Adaptive Statistical Language Modelling

Authors

  • Raymond Lau
  • Victor Zue
Abstract

The trigram statistical language model is remarkably successful when used in such applications as speech recognition. However, the trigram model is static in that it only considers the previous two words when making a prediction about a future word. The work presented here attempts to improve upon the trigram model by considering additional contextual and longer distance information. This is frequently referred to in the literature as adaptive statistical language modelling because the model is thought of as adapting to the longer term information. This work considers the creation of topic specific models, statistical evidence from the presence or absence of triggers, or related words, in the document history (document triggers) and in the current sentence (in-sentence triggers), and the incorporation of the document cache, which predicts the probability of a word by considering its frequency in the document history. An important result of this work is that the presence of self-triggers, that is, whether or not the word itself occurred in the document history, is an extremely important piece of information. A maximum entropy (ME) approach is used in many instances to incorporate information from different sources. The maximum entropy approach selects the model that maximizes entropy while satisfying the constraints imposed by the information we wish to incorporate. The generalized iterative scaling (GIS) algorithm can be used to compute the maximum entropy solution. This work also considers various methods of smoothing the information in a maximum entropy model. An important result is that smoothing improves performance noticeably and that Good-Turing discounting is an effective method of smoothing.

Thesis Supervisor: Victor Zue
Title: Principal Research Scientist, Department of Electrical Engineering and Computer Science
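
The document cache and the self-trigger described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `cache_probability` and `self_trigger`, the whitespace tokenization, and the use of a plain relative frequency are all assumptions made for illustration. The abstract states that such evidence is combined with other sources through maximum entropy, which this sketch does not attempt.

```python
from collections import Counter


def cache_probability(word, history):
    """Relative frequency of `word` in the document history.

    Hypothetical helper: returns 0.0 when the history is empty.
    """
    if not history:
        return 0.0
    return Counter(history)[word] / len(history)


def self_trigger(word, history):
    """Binary feature: did `word` itself already occur in the history?"""
    return word in history


# Toy document history, tokenized by whitespace (an assumption).
history = "the bank raised rates and the bank cut staff".split()

print(cache_probability("bank", history))  # 2 occurrences out of 9 tokens
print(self_trigger("bank", history))       # True
print(self_trigger("loan", history))       # False
```

A word that has already appeared in the document receives a nonzero cache estimate and fires the self-trigger feature, whereas a static trigram model would see only the previous two words.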


Similar resources

Topic-based mixture language modelling

This paper describes an approach for constructing a mixture of language models based on simple statistical notions of semantics using probabilistic models developed for information retrieval. The approach encapsulates corpus-derived semantic information and is able to model varying styles of text. Using such information, the corpus texts are clustered in an unsupervised manner and a mixture of ...


Adaptive Compression-based Approach for Chinese Pinyin Input

This article presents a compression-based adaptive algorithm for Chinese Pinyin input. There are many different input methods for Chinese character text and the phonetic Pinyin input method is the one most commonly used. Compression by Partial Match (PPM) is an adaptive statistical modelling technique that is widely used in the field of text compression. Compression-based approaches are able to...


A maximum entropy approach to adaptive statistical language modelling

An adaptive statistical language model is described, which successfully integrates long distance linguistic information with other knowledge sources. Most existing statistical language models exploit only the immediate history of a text. To extract information from further back in the document’s history, we propose and use trigger pairs as the basic information bearing elements. This allows the...


Modelling and Analyzing Adaptive Self-assembly Strategies with Maude

Building adaptive systems with predictable emergent behavior is a challenging task and it is becoming a critical need. The research community has accepted the challenge by introducing approaches of various nature: from software architectures, to programming paradigms, to analysis techniques. We recently proposed a conceptual framework for adaptation centered around the role of control data. In ...




Publication date: 1994